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ABSTRACT 

Dusty, star forming galaxies contribute to a bright, currently unresolved cosmic far-infrared 
background. Deep Hersc/2e/-SPIRE images designed to detect and characterize the galaxies 
that comprise this background are highly confused, such that the bulk lies below the classi- 
cal confusion limit. We analyze three fields from the HerMES programme in all three SPIRE 
bands (250, 350, and 500 fim); parameterized galaxy number count models are derived to 
a depth of ~ 2 mJy/beam, approximately 4 times the depth of previous analyses at these 
wavelengths, using a P(D) (probability of deflection) approach for comparison to theoretical 
number count models. Our fits account for 64, 60, and 43 per cent of the far-infrared back- 
ground in the three bands. The number counts are consistent with those based on individually 
detected SPIRE sources, but generally inconsistent with most galaxy number counts models, 
which generically overpredict the number of bright galaxies and are not as steep as the P (D)- 
derived number counts. Clear evidence is found for a break in the slope of the differential 
number counts at low flux densities. Systematic effects in the P(D) analysis are explored. We 
find that the effects of clustering have a small impact on the data, and the largest identified 
systematic error arises from uncertainties in the SPIRE beam. 

Key words: Submillimeter Galaxies - Cosmology: Observations 



1 INTRODUCTION 

The cosmic far-infrared background (hereafter CFIRB) provides 
unique information on the history of energy injection in the Uni- 
verse by both star formation and active galactic nuclei. First de- 
tected by the COBE satellite jPuget et al.|T996"l|Fixsen et al.|1998| >, 
the CFIRB contains a large amount of energy, indicating that the 
total luminosity from thermal dust emission is comparable to the 
integrated UV/optical energy output of galaxies (Guiderdoni et al. 
[T997l >. 

Galaxy surveys, both from the ground (with SCUBA, 
LABOCA, Bolocam, AzTEC, and MAMBO at 850 pm, 870 fim, 
1.1 mm, 1.1 mm, and 1.3 mm, respectively) and from space us- 
ing IRAS (at 12, 25, 60 and 100 pm), ISO (at 15, 90, and 170 
pm), and Spitzer (at 3.6 to 160 fim), found high number counts 
compared to non-evolving galaxy number counts models. This im- 
plied that strong evolution of the source populations must have oc- 
curred, challenging contemporary galaxy evolution models jSaun-| 
|ders|1990[ [Scott et al.|2002"{|Lagache et al.|2003| >. Deeper number 
counts test galaxy formation models more severley. By stacking 
Spitzer MIPS 24 pm sources, at least 80% of the CFIRB was re- 
solved at 70 pm and 65% at 160 pm (Dole et al.(20 06 ; Bethe rmin et| 
|al.|2010a) . A small fraction (10-20%) has been resolved in the sub- 
millimeter in blind sky surveys from ground-based observatories, 
but it is possible to go deeper by taking advantage of gravitational 
lensing. At 850 pm this approach has resolved 60% or more of the 
background in small fields ( Smail et al. 2002 , Ze mcov et al.|2010| >. 

A P(D) - probability of deflection - analysis of Bolocam 
observations of the Lockman Hole (Maloney et al. 2005) demon- 
strated that a fluctuation analysis can provide more stringent con- 
straints on source number counts than those derived by extract- 
ing individual sources, for which the threshold must be set high 
enough to ensure a minority of false detections. P (D) techniques 
were first developed for application to radio observations ( Scheuer 
|1957[ >, but have since been widely applied to other regimes. P (D) 
was used to account for the majority of the X-ray background long 
prior to the availability of sufficiently deep imaging to resolve in- 
dividual sources (Barcons 1994), to extend deep infrared counts 
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(Olive r et aT]|1997| l, and in the sub-mm to SCUBA (Hughes et 
al.|1998| >, LABOCA flWeifl et al.|2009l >, and AzTEC ( |Scott et al. 



2010[ > data. The depth of a P(D) analysis is set by the flux den- 
sity at which the number of sources per beam becomes large. The 
resulting contribution to the P (D) becomes that of a Poissonian dis- 
tribution with a large mean, which becomes difficult to distinguish 
from the nearly-Gaussian instrumental noise. An often-used rule of 
thumb for the maximum depth is one source per beam, but the pre- 
cise limit depends on the survey area, the shape of the underlying 
counts, and how precisely the instrumental noise is known. In prac- 
tice, for rapidly rising source counts at faint fluxes, this is consider- 
ably deeper than the limits for a source-extraction approach. Fluc- 
tuation analyses are well-suited to determination of source number 
counts in the case where the dynamic range of detected sources is 
not large because of confusion. Deep number counts are interest- 
ing because they allow us to measure the sources responsible for 
the bulk of the CFIRB, and because they probe intrinsically fainter 
galaxies which may have better matching counterparts in the local 
Universe. 

Recently, a P(D) analysis was performed on 250, 350, and 
500 pm observations of a 10 deg 2 field (GOODS-S) with a 0.8 deg 2 
deep inner region from the balloon-borne BLAST telescope, using 
duplicate SPIRE detector technology (Patanchon et al. 2009 here- 
after P09). Differential number counts were estimated down to 20, 
15, and 10 mJy in the three bands, respectively. Below these thresh- 
olds, upper limits were provided. Combined with 24 pm observa- 
tions, [Deylin|eraL|J2009l concluded that a large fraction (> 1 /2) of 
the CFIRB comes from galaxies with z > 1.2. Also from BLAST 
observations, Marsden et al. (2009) concluded that 24 yum-selected 
galaxies can account for the entire CFIRB based on a stacking anal- 
ysis. These results confirm that fluctuation and stacking analyses 
have substantial power in elucidating the sources of the CFIRB. 
Such techniques will also be necessary for SPIRE observations be- 
cause galaxy models predict that at the confusion limit SPIRE is 
expected to resolve only a small fraction of the CFIRB ( Lag ache et| 
|al.|2003|[Fernandez-Conde et al.|2 008 ). A recent source extraction- 
based analysis of the SPIRE Science Demonstration Phase (SDP) 
data - the same data used in this paper - directly resolved 15, 10, 
and 6 per cent of the CFIRB at 250, 350, and 500 pm, respectively 
( jOliver et al.|2010| ). At shorter wavelengths, |Berta| p0"l0| > directly 
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resolved 52 and 45 per cent of the CFIRB at 100 and 160 yum using 
Herschel-PACS SDP data. 



2 DATA 

The observations used in this analysis were obtained with the 
SPIRE ins trument (|Griffin et al.|2 010 l on the Herschel Space Ob- 
servatory (Pilbratt et al ,|2010 1 as part of the HerMES programing] 
(Oliver et al. 2010, in prep) during the SDP. SPIRE observes simul- 
taneously in three passbands: 250, 350 and 500 fim. The on-orbit 
beam sizes, including the effects of the scanning strategy, are 18.1, 
25.2, and 36.6 arcseconds, respectively, with mean ellipticities of 
7 - 12%. The calibration is based on observations of Neptune, and 
is described in Swin yard et al.| ([2010 ). Observations of five fields 
were obtained during SDP, but only three are used in this analysis: 
GOODS-N, Lockman-North, and Lockman-SWIRE. Their proper- 
ties are summarized in table[T] The Lockman-North region are con- 
tained within the shallower Lockman-SWIRE field. The HerMES 
SDP fields omitted from this analysis are: FLS, which was left out 
because it is the same depth as the much-larger Lockman-SWIRE 
field and is significantly contaminated by infrared cirrus, and Abell 
2218, because the strong lensing in this field complicates the inter- 
pretation of the background number counts. 

The detector (bolometer) timelines were processed using the 
standard SPIRE pipeline, which detects cosmic rays and removes 
instrumental signatures and temperature drifts 1 Powel l et al.|2010| >. 
The maps were produced using the SMAP package (Levenson et 
al. 2010, in prep) using 1/3 beam-FWHM pixels (6, 8 1/3, and 
12 arcsec); this is a compromise between adequately sampling 
the beam and maintaining even coverage over the map. Samples 
flagged as contaminated by cosmic-rays were excluded. Each map 
was masked to form an even-coverage region and was mean sub- 
tracted. In addition to the even-coverage mask, a small amount of 
additional masking was required, as there are five resolved sources 
in the Lockman-SWIRE field. These sources are relatively bright, 
but are not the brightest in the field. Since the P (D) formalism is 
based on unresolved point sources, we mask these objects with a 
2 arcmin circular mask, and then correct our final number counts 
using the measured flux of each excluded source. Our instrumental 
noise estimates are based on the technique of Ng uyen et al.| ( |20lTJ| ), 
and are assumed to have 5% uncertainty, which represents only the 
uncertainty for a fixed calibration. The overall SPIRE calibration 
error is discussed in § |5.1| The resulting pixel flux density his- 
tograms are shown in figure[T| 

Smoothing the maps by the beam (via cross-correlation) is 
beneficial for finding isolated point sources. For BLAST obser- 
vations, [Chapm^eFlir] j20Kj) show that the standard point-source- 
optimized filter should be modified in the presence of confusion 
noise. However, there is no guarantee this will benefit a P (D) anal- 
ysis. Smoothing the map has the effect of broadening the effective 
beam, which decreases the depth that the P (D) can probe, while 
also reducing (but correlating) the instrumental noise. We empiri- 
cally determined if smoothing is beneficial for our analysis by fit- 
ting simulated data with and without smoothing and comparing the 
scatter in the recovered model parameters to the error estimates, 
and found that it helps for all but our deepest map (GOODS-N); 
note that our GOODS-N data are several times deeper (relative to 
the confusion noise) than the deepest BLAST observations. It is 
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Figure 1. Histogram of pixel flux densities for the three fields considered 
in this paper in 5 mjy bins. The Lockman-SWIRE field is considerably 
shallower than the others, but it is the only field large enough to probe the 
bright end of the source distribution. The maps have been mean subtracted. 
The binning here is for display purposes and does not correspond to the 
binning used in the actual analysis, which is much finer. In all cases the 
pixel histograms show clear non-Gaussianity despite the Gaussian nature of 
the noise, indicating that a significant point-source contribution is present. 



likely that some amount of spatial de-convolution would be benefi- 
cial for the GOODS-N field, but since this would significantly com- 
plicate the instrumental noise properties of our maps, and hence re- 
quire extensive testing, we do not pursue de-convolution here, but 
do beam-smooth the two shallower fields. In addition, we have ap- 
plied a 6 arcmin high-pass spatial filter to our maps to reduce the 
effects of clustering on our results. The motivation for this is dis- 
cussed in § |3.3| 

A P(D) analysis is critically dependent on measurements of 
the effective beam (which includes the effects of the map making 
and reduction, as well as any smoothing applied). Our beam map 
is based on in-flight fine-scan (highly oversampled) observations of 
Neptune with a large number of repeats and small offsets between 
each scan. These observations allowed us to measure the beam with 
finer resolution than our data maps to properly match the SPIRE 
calibration (which is timestream rather than map based), and to 
build beam maps for the individual bolometers. We corrected these 
observations for the relative motion of Neptune during the scans 
using the HORIZONSj^jephemeris computation service at the orbit 
of Herschel. The Neptune observations are deep enough that the 
third Airy ring is clearly detected for the array- averaged beams. As 
discussed later (§ |5.1fr , beam uncertainties are our largest identified 
systematic. 



3 METHOD 

We first describe the basic P (D) framework, then discuss our par- 
ticular implementations and the filtering we have applied to limit 
the effects of clustering. 



3.1 Description of P (D) 

If dN/ dS (S ) is the differential number counts per solid angle and 5 
the flux density, then the mean number density of sources per unit 
solid angle with observed flux densities between x and x + dx is 



R (x) dx ■■ 



dN 
dS 



g) 



b~ l d£ldx 



(1) 
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Table 1. HerMES SPIRE Science Demonstration Phase observations used in this paper 



Field 


Size 


RA 


Dec 


Scan Rate 


Repeats 


O"250 


O"350 


O"500 




deg 2 


deg 2 


deg 2 


arcsec/s 




mJy/beam 


mJy/beam 


mJy/beam 


GOODS-N 


0.29 


189.23 


62.24 


30 


30 


1.77 


1.59 


1.89 


Lockman-North 


0.41 


161.50 


59.02 


30 


7 


3.58 


3.16 


4.41 


Lockman-SWIRE 


13.6 


162.0 


58.11 


60 


2 


9.47 


8.47 


11.99 



The <x values for each band and field are the instrumental noise per pixel before any filtering or smoothing is 
applied. The confusion noise (the signal in this analysis) is ~ 6 mJy/beam in all bands (Ngu yen et al.|2010) . 



where b is the beam function (not necessarily peak normalized). 
Ignoring clustering, the probability distribution of sources is Pois- 
sonian. The probability distribution function (pdf) for the observed 
flux in each sky area unit (usually a map pixel) is the convolution of 
the pdfs for each flux interval over all fluxes; this quantity is called 
the P{D). Rewriting the above in terms of characteristic functions 
and denoting the inverse Fourier transform by F" 1 , 



P(D) 



exp 



R (x) exp (iajx) dx 



r 



R (x) dx 



(2) 



The mean of the P (D) is 

fi = J xRdx = J bdQ. J S 



dN/dS(S) dS, 



and the variance is 



I h2dQ I 



S 2 dN/dS(S) dS. 



For real observations, the instrumental noise contribution must also 
be included. Our observations are not sensitive to the mean flux in 
the maps. Therefore, it is useful to subtract off the mean of the P (D) 
during construction. Only in the case of very simple models for 
dN IdS combined with trivial beams is it possible to compute P (D) 
analytically - an example is given in Scheuer ( 19571, but even this 
is only valid for a restricted range of parameters. For an effective 
beam that is not strictly positive (due to filtering, for example), the 
P(D) is the convolution of the individual P(D)s for the positive 
beam and for the negative beam (P09). Azimuthally averaging the 
beam does not preserve the P (D), so it is necessary to use the full 
2D beam map. 

The log likelihood (log £.) of a dataset relative to a particular 
model is given (to within an irrelevant normalizing constant) by 



log£ = Y J logP(.D i ), 



where £), is the value of the f* pixel. Usually it is more convenient 
to bin the data. As long as the individual bins are small compared to 
the width of the P (D), the two formulations are practically equiva- 
lent. Then 



\og£ = Y i hlogP(x k ), 



(3) 



where h k is the number of pixels in the histogram bin centred at 
flux density x k . The alternative of using the x~ as the fit statistic 
under-weights bins with a small number of pixels in them because 
the uncertainty in such a bin is not well modeled by sfh^, and is not 
recommended. 

This treatment assumes that different pixels are uncorrelated, 
which is not true unless the beam is much smaller than a pixel. 
A source at one location will affect neighbouring values over an 
area about equal to the area of the beam. The result is that, if ap- 
plied naively, fits based on the above likelihood will underestimate 



the model errors. Properly treating this effect requires developing 
the P (D) formalism in terms of multi-variate Poisson distributions, 
which is computationally infeasible. P09 recommend dividing the 
likelihood by the beam area in pixels (£, i-> -C/A b ) in order to cor- 
rect for this effect, which amounts to approximately correcting the 
likelihood for the number of independent samples in the map. This 
is an ad hoc approach, but in the absence of a better alternative, we 
have adopted a similar method. However, based on Monte-Carlo 
simulations of synthetic data sets, we find that this correction fac- 
tor is overly conservative, as discussed below. 

Another approximation in the above treatment which is not 
valid for real data is that the sources are not Poisson distributed due 
to clustering. Our approach to this effect is described in § |3.3| 

3.2 Implementation 

We have developed two independent P (D) analysis packages and 
checked them against each other. Given the large number of pa- 
rameters and the non-linear nature of the problem, both make use 
of Markov Chain Monte Carlo (MCMC) methods. Overviews of 
MCMC methods can be found in |MacKay|p002^ ; |Lew"is & Bridle| 
l |2002) ; |Dunkley et al.|(2005) . The important aspects of an MCMC 
implementation are the burn-in criterion and the proposal density. 
The burn-in criterion is the rule used to determine whether the fit 
has converged on the region of maximum likelihood. Once the fit 
has converged, subsequent steps are drawn from the posterior prob- 
ability of the model given the data, and only these steps are used to 
measure the errors and values of the parameters. The proposal den- 
sity is used to propose the next step in the Markov Chain from the 
current step. Any proposal density that can visit all valid parame- 
ters is correct, but a well chosen density can dramatically improve 
the efficiency of the fitting procedure. 

The first code is written in IDL0and the burn-in criterion is 
based on the power spectrum of the single chain (Dunkley et al. 
2005 ). The first chain step within A log £, = 2 of the best-fit pa- 
rameters is taken as the start of the converged sampling. A Fisher- 
matrix approximation to the P (D) fit is used for the proposal den- 
sity. This code interpolates in log space from a moderate number 
(~ 80) of flux densities to calcluate the P{D). The other code is 
written in C++, and is explicitly parallel. It uses the Gelman-Rubin 
criterion for burn-in ( |Gelman ~& Rubin 1992), which is based on 
computing the variance between chains and directly provides the 
point of convergence. This code does not use interpolation when 
computing the P (£>), but supports a more limited range of models. 
The proposal density is a multi-variate Gaussian estimated from 
the previous fit steps, and is frozen in at burn-in. We have checked 
these codes against each other on simulated data, and find good 
agreement. 



3 Interactive Data Language: http://www.ittvis.com/ProductServices/IDL.aspx 
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Our P (D) methodology is almost identical to that described in 
P09 except as follows. First, P09 explicitly fit for the mean of pixel 
values in the map (ji). Since we can analytically predict the mean of 
the P (D) for a given set of model parameters, we simply shift the 
mean to zero explicitly during construction. The input map is also 
mean-subtracted, and the uncertainty in this subtraction contributes 
negligibly to our error budget. Second, P09 fit to the instrument 
noise explicity for each field except for the deepest section of their 
map. Instead, we marginalize over the noise for all fields in our full 
fits, but use the measurements of Nguyen et al. ( 20101 as a prior, 
assuming an Gaussian uncertainty of 5%. At low flux densities, the 
number of sources per beam is large, and hence the contribution 
to the P(D) is almost Gaussian. Therefore, the values of dN/dS 
for the faintest flux densities probed and the noise level are nearly 
degenerate, and hence fixing the noise will tend to under-estimate 
the uncertainties in the model parameters at the faint end. 

We have developed a simple simulation framework to test 
these codes and their sensitivity to various effects such as 1 // noise. 
As inputs we consider two types of catalogs that should be repre- 
sentative of the sub-mm sky: the P09 models, and the simulations 
of |Fernandez-Conde et aT1p008) . The fits to the P09 models are 
easier to compare with the inputs, but the[Fernandez-Cond e et al.] 
(2008 1 models include clustering effects. 

A fake sky is generated from the input catalogue, and scanned 
using the pointing information from the actual SPIRE observations. 
Different noise levels (white and 1 //) can be specified. These data 
are then run through the same map-making pipeline as the real 
data. In addition to the simulated science data, we also simulate 
observations of Neptune using the same framework to determine 
the beams we use when fitting the simulated data. These simula- 
tions use simple Gaussian beams with FWHMs similar to those 
measured on-orbit, and account for characteristics of the data in- 
troduced by the mapping pipeline, but do not simulate errors in the 
lower-level SPIRE pipeline (pointing errors, crosstalk-corrections, 
etc.). We use them to quantify the effects of 1// noise, uneven cov- 
erage, clustering, and smoothing by the beam on our maps. The 
SPIRE 1 // knee frequency is a few mHz, corresponding to a spa- 
tial scale of approximately 3 degrees for a scan speed of 30 arcsec- 
onds per second, and our map-making algorithm reduces this al- 
ready small amount as discussed in Levenson et al. (2010, in prep). 
We find that the remaining amount, as well as the uneven coverage, 
introduces negligible bias in our fits, but that clustering can have 
measureable effects on our largest maps, as discussed in § |3.3| 

In addition, we have determined the appropriate correction 
for pixel-pixel correlations using the same framework and a large 
number of simulated HerMES datasets. We find that the correct 
normalization factor varies with signal-to-noise ratio of the map, 
and whether it has been additionally smoothed with the beam. If 
the map is beam-smoothed, then the beam area factor is approxi- 
mately correct, if slightly conservative for deeper fields; note that 
all of the maps in P09 were beam-smoothed. However, for deep, 
unsmoothed maps, this procedure clearly overestimates the uncer- 
tainties (by about a factor of 2 for GOODS-N). Rather than de- 
rive individual correction factors for each field, we have taken the 
more conservative approach of finding the largest correction factor 
(which therefore increases the uncertainties the most) for all of our 
fields, and applying it to all un-smoothed data. For the GOODS-N 
and Lockman-Norfh observations, the correct normalization factor 
(without smoothing) is less than A b /3. Because we do not have 
an exact formulation for this correction, we conservatively adopt 
2A b /5. For the smoothed observations, we adopt the A b normaliza- 
tion, also conservatively; this means that the two Lockman fields 



have the same correction factors, but the (un-smoothed) GOODS- 
N field has a different one. 

3.3 Filtering 

Clustering will affect the P (D) distribution in two ways. First, the 
presence of clustering implies sample variance effects, so that the 
SDP fields may not be representative of the all-sky number counts. 
Second, the fact that the underlying counts are not Poisson dis- 
tributed would change the shape of the P (D) distribution even if 
we were somehow lucky enough to select a precisely average re- 
gion of sky. This effect can be modeled if all of the «-point statis- 
tics of the source distribution are known (Barcons 1992j). The ef- 
fect on the width of the P (D) is discussed in Appendix A of P09, 
although clustering is not purely limited to changing the width of 
the distribution. Only the 2-point function has been measured for 
the population sampled by SPIRE, and even this is not known at 
the flux densities important for our results. The first issue is dis- 
cussed in § |5.1[ and the second here. There are two effects: clus- 
tering on small scales between individual SMGs, and clustering on 
larger scales between groups of SMGs. 

The framework for the clustering contribution to the P (D) is 
given in Barcons (1992); Takeuchi & Ishii (2004). The contribu- 
tion to the w th moment is proportional to J P„ (k) b (k)" d 2 k, where 
P„ (k) is the power spectrum of the rc-point correlation function, 
and b (k) is the Fourier transform of the beam, b falls rapidly with 
|k| (e.g., an 18 arcsecond FWHM Gaussian beam has a 1/e value at 
k = 1 .2 arcmin -1 , and the higher powers fall off even more rapidly). 
Thus, small scale clustering, which is implied by the measurements 
of, e.g., |Blain et al.|p0 04), is filtered out by the beam on scales of 
less than about one to two arcminutes in our data. 

Generically, P„ falls rapidly with |k|, suggesting that high-pass 
filtering the maps may mitigate large-scale clustering effects. In 
particular, in the far-IR the power spectrum of the two point cor- 
relation function (P 2 ) shows excess power above Poissonian noise 
at scales larger than 10' JLagache et al.||2007| [Viero et al.||2009| 
|Cooray et al.|2010) . In order to reduce ringing, our filter consists 
of a high-pass filter with a turn on at 6 arcmin convolved with a 
<r = 1.8 arcmin Gaussian. Only the Lockman-SWIRE field is large 
enough to be significantly affected because the other fields are not 
much larger than this scale. Since the benefit of P (D) analyses is at 
faint flux densities where most of the CFIRB arises, and the shal- 
low Lockman-SWIRE field has little constraining power here, our 
main scientific results are minimally affected by non-Poissonian 
clustering effects even if we ignore them. In fact, we find that the 
differences between fits to simulated data with and without cluster- 
ing are well within the statistical errors even without filtering. 

Analysis of simulated data from Fernand ez-Conde et al.| 
(2008), which has linear clustering based on the assumption that 
infrared galaxies are tracers of dark-matter fluctuations, shows that 
a high-pass filter is quite effective at removing clustering signal 
for this data set. We construct two sets of simulated maps: one with 
clustering, and another using the same catalogue but with clustering 
removed by randomizing the source positions. We then compare 
fits and pixel histograms for both maps. Because filtering will af- 
fect the P (D) even in the absence of clustering, comparing these to 
unclustered, unfiltered maps is not useful. The fits recover the input 
model accurately in both cases, whereas if we do not filter dN/dS 
is slightly underestimated at low flux densities for the largest maps. 
Smaller maps show no evidence for bias. A pixel histogram from 
such a simulation is shown in figure [2] Such filtering is also effec- 
tive at removing infrared cirrus, although we have not tested this 
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Figure 2. Comparison of the pixel histogram for simulated Lockman- 
SWIRE data with and without clustering, in the absence of high-pass filter- 
ing (left) and with a high-pass filter applied as described in the text (right). 
The top panel shows the resulting pixel histograms, and the bottom shows 
the fractional difference between the clustered and unclustered pixel his- 
tograms. The thick line in the bottom panel shows the average difference 
smoothed over a 3 mjy scale. Without filtering, there is a clear trend in 
the fractional difference, but with filtering the difference is consistent with 
zero for the bulk of the pixel fluxes. Even without filtering, the effect on the 
measured counts is smaller than the statistical errors. 



explicitly in terms of the recovered fit parameters. However, it is 
possible that clustering signal on scales between one and six ar- 
cminutes could affect our results. This regime is currently not well 
characterized and thus we could not model it quantitatively in our 
analysis. 



4 MODEL 

The best approach for comparing a particular model to SPIRE data 
using P(D) is to generate pixel histograms as a function of the 
model parameters and compare directly with our data. However, not 
all models have smoothly adjustable parameters, and furthermore, 
if the model is a poor fit to the data this may provide little insight 
as to at which flux densities the model disagrees with observations. 
Hence, we have followed P09 and fit simple, non-physical paramet- 
ric models to our data. These models are defined by the values of 
the differential number counts dN/dS at a set of fixed flux densities 
(knots). Observationally we can never do more than place a lower 
limit on the total number of sources fainter than 5, N(< S) be- 
cause we can never measure all the way down to zero flux density, 
but dN IdS is better behaved because it only depends on the num- 
ber of sources in some small range. The actual fit parameters are 
log 10 dN IdS at the knot positions. The differential number counts 
must become shallower than S~ 2 at low flux densities or else the 
contribution to the CFIRB diverges. However, this turn-over may 
lie below the flux densities probed by our data. Therefore, in order 
to avoid biasing our fits by excluding models which are too steep 
within the range of our measurements (i.e., would overpredict the 
CFIRB if integrated down to zero flux density), we assume that the 
number counts outside the largest and smallest knot are zero; the 
problem of choosing the limits is discussed below. 

A P (D) fit requires that the number counts model is continu- 
ous. Therefore, we must choose a method of interpolating between 



the knots, and for a finite number of knots, the interpretation of 
our results depends on the interpolation method. We consider two 
methods of interpolation in this paper: first, as in P09, using power 
law extrapolation between each knot (these are multiply-broken 
power-law fits), and second, using a cubic spline in log-log space. 
The first code supports both methods, and the second only the for- 
mer. We do not expect the fit parameters (i.e., dN/dS at the knot 
positions) of these models to be identical, since they have different 
meaning. 

It is important to understand that the results of this paper are 
model fits. The fit results are not simply dN IdS at the flux densi- 
ties of the knots, but instead are effectively integral constraints over 
some region surrounding each knot. Any excursion in the number 
counts that lies entirely between two knots will affect at least both 
neighboring knots, and likely others as well. The flux density range 
that each fit parameter is sensitive to depends on the interpolation 
scheme, with the spline response more local to the knot. Therefore, 
simply reading off the values predicted by a theoretical or empiri- 
cal number counts model at the knot positions and comparing that 
with our fit parameters is wrong since they are integral constraints 
over a region surrounding the knot. This is also true for more tradi- 
tional methods (i.e., simple number counts derived from individual 
galaxy detections) because of the importance of the de-boosting 
corrections for low signal-to-noise ratio detections. A preferable 
approach is to first find the best approximation to the differential 
counts of the theoretical (or empirical) number counts model cho- 
sen for comparison using either of our parametric models (for ex- 
ample, by fitting a multiply-broken power law to the dN/dS of the 
theoretical model giving equal weighting to all fluxes, not just the 
values at the knots) and comparing the parameters of that approxi- 
mation to our results. 

The highest and lowest knot positions must be chosen with 
some care because the differential number counts are assumed to 
be zero outside this range. Our criterion is based on examining the 
effects of cutting the number counts at a given level on a selection 
of galaxy evolution models from the literature. We compare the pre- 
dicted P (D) for each literature model truncated below a specified 
flux level with the P (D) without truncation, and find that our data 
are not sensitive to a cut-off of less than 0. 1 mjy at 250 pm. A sim- 
ilar analysis shows that truncating the fit above 1 Jy is also unde- 
tectable, with similar values for the other passbands. From simula- 
tions, we find that we can obtain good constraints if the second knot 
lies approximately at the l<x instrumental noise. Because the num- 
ber counts below our flux limit are unlikely to be well described 
by a single knot all the way down to 0. 1 mjy, the fit value for this 
point should be treated with care; simulations indicate that this is 
not a problem for the 2 mjy knot. In order to avoid over-tuning 
our fits to represent literature models, we adopt approximately log- 
arithmically spaced knots between these extremes. The choice of 
the number of knots is somewhat arbitrary. Neighboring knots are 
very strongly correlated, and as the number increases the correla- 
tions increase. We have tried to chose the number of knots to be as 
large as possible while keeping the correlations reasonably small. 



5 RESULTS 

We fit all three fields simultanously, but each band independently. 
The uncertainty in the instrumental noise is modeled as a single 
multiplicative factor having a Gaussian prior with a = 5%. Note 
that we are making the assumption that the timestream instrumen- 
tal noise is the same for all three fields as found in Nguy erTet al.| 
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Table 3. Differential Number Counts Constraints For a Multiply-Broken 
Power-Law Model with the FIRAS prior 



Table 5. Differential Number Counts Constraints For a Spline Model with 
the FIRAS prior 
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Marginalized fit parameters for a multiply-broken power-law model from 
a joint analysis of all three fields including the FIRAS CFIRB prior of 
Fixsen et al. 1 1998 1. Quoted uncertainties are 68.3% confidence intervals, 
except for the first knot where la- upper limits are given. The systematic 
uncertainties are the same as in table|2] 



< |20 10| >. In addition to the SPIRE data, we also explore the effects 
of including the FIRAS CFIRB prior l |Fixsen et al.|1998| > by inte- 
grating SdN/dS for our model down to the lowest knot and adding 
a term to the likelihood that compares that value with the FIRAS 
measurement and its error. This assumes that the CFIRB is entirely 
due to discrete sources, and that flux densities outside the range of 
our model contribute only negligibly. We integrate the Fixsen et al. 
( 19981 spectrum through the SPIRE passbands and adopt the rel- 
ative errors given in Marsden et al. ( 2009 ). The uncertainty in the 
relative calibration between FIRAS and SPIRE significantly affects 
the utility of this prior. 

The best fit multiply broken power-law fit is compared with 
the GOODS-N data in figure[3] and the parameters are given in ta- 
bles [2] and [3] and for the spline interpolation fits in tables [4] and 
[5] The correlations between adjacent knots are large and negativ^] 
with typical correlation coefficients of -0.5 to -0.8. The two mod- 
els are compared with each other in figure]?] The two interpolating 
models (spline and multiply-broken power-law) produce very sim- 
ilar results. As discussed previously, these are model fits, not inde- 
pendent number counts, and since the parameterizations differ, di- 
rectly comparing the values at the knot positions is not entirely cor- 
rect. Nonetheless, the agreement is clear. Also, because the models 
were fit to the same data their results should not be coadded: they 
are both presented to demonstrate that similar results are obtained 
with using independent codes. 

Since the agreement is so good, we express no preference 
of one model versus the other (multiply-broken power-law versus 
spline). However, we note that the spline model has a narrower flux 
density window function about each knot, and thus represents the 
differential number counts of the knot position slightly more accu- 
rately locally than the power-law model. For comparison to other 
number counts models, one can either (i) select the fits with the FI- 
RAS prior, which assumes that the remaining portion of the CFIRB 
unaccounted for by our priorless fits is encompassed in the range 



Covariance matrices are provided at http://www.hermes.sussex.ac.uk/ 



Marginalized fit parameters for a spline interpolation model from a joint 
analysis of all three fields including the FIRAS CFIRB prior of |Fixsen et| 
|al.H1998) . The systematic uncertainties are the same as in table[4] 

between the upper limit on dN/ dS at 0. 1 mJy and the 2 mJy knot (at 
250 /jm) - this method is simpler; or (ii) select the fits without the 
CFIRB prior, in which case the prior should be applied indepen- 
dently. The latter does not require that the model share the same 
assumptions about the number counts at low flux densities as our 
fits. 

Our fits are compared with other measurements in figures [5] 
and [6] Ignoring the lowest knot (where only an upper limit is 
available), our fits predict a CFIRB flux density of 0.54 ± 0.08, 
0.39 ± 0.06, and 0.16 ± 0.03 MJy sr 1 from all sources down to 2 
mJy in the three bands; the dominant error in all cases is due to the 
15 per cent calibration uncertainty of SPIRE. The contribution from 
each flux range is shown in figure]?] The CFIRB from Fixs en et al.| 
(1998) integrated over the SPIRE bands is 0.85 ± 0.19, 0.65 ± 0.19, 
and 0.39 ± 0. 10 MJy sr~' , respectively, so our fits therefore account 
for 64 ± 16, 60 ± 20 and 43 ± 12 percent in the SPIRE 250, 350, and 
500 yum bands, respectively. We expect to resolve a smaller fraction 
of the CFIRB at longer wavelengths because the size of the SPIRE 
beam is proportional to wavelength, and hence the 500 fim band is 
more confused. Here the errors are dominated by the uncertainty in 
the FIRAS measurement. We find marginalized values for the in- 
strumental noise that are 1.02, 1.1, and 1.01 times the values given 
in table[T|at 250, 350, and 500 fim, respectively, giving a/ of 4.2 
for 3 degrees of freedom. Hence, our instrumental noise values are 
consistent with the Ng uyen et al.| (2010l prior. 



5.1 Systematic Effects 

Our basic tool for estimating the importance of a particular system- 
atic is to compute the A log £, between the P (D) with and without 
the effect for maps the same size and depth as our data. We use 
the P09 best fit model as a basis for this computation. Recall that a 
AlogX of 0.5 corresponds roughly to a lcr statistical error. 

Because different parts of the map are sampled by different 
bolometers, and the beam shape varies across the bolometer ar- 
ray, the effective beam will vary over the map. We evaluated this 
effect by choosing 200 random pixels in our maps and comput- 
ing the fractional contribution of each bolometer to each pixel. We 
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Table 2. Differential Number Count Constraints For a Multiply-Broken Power- 
Law Model 
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Marginalized fit parameters for a multiply-broken power-law model from a joint 
analysis of all three fields without using the FIRAS CFIRB prior. The quoted 
uncertainties are the 68.3% confidence intervals for the statistical error followed 
by the estimated systematic uncertainty, except for the first knot where the lcr 
upper limit is given. 
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Figure 3. Comparison of the GOODS-N pixel histograms (solid lines) to the best fit model to all three fields (dashed 
lines) using the multiply-broken power law fit and not including the FIRAS prior. 



Table 4. Differential Number Counts Constraints For a Spline Model 
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Marginalized fit parameters for a spline model from a joint analysis of all three 
fields. Quoted uncertainties are as in table[2] 



HerMES P (D) Fluctuation Analysis 9 




Figure 4. Comparison of the multiply-broken power-law (solid lines) and spline (dashed lines) P (D) fits for the 
differential number counts to the three SDP fields simultaneously, without the FIRAS prior; l<x upper limits are 
shown as arrows. For this comparison, only statistical errors are shown. 
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Figure 5. Comparison of Euclidean-normalized multiply-broken power law differential number counts fits (solid 
lines and circles) with previous balloon-based measurements from BLAST, not using the FIRAS prior. The BLAST 
P(D) analysis (P09) is shown as dashed lines and stars, the stacking analysis of Bethermin et al. (20 10b I as triangles 
and the source extraction analysis from the same reference as squares. Here the combined statistical and systematic 
errors are shown. 



100F 




Flux Density [mjy] 

Figure 7. The contribution to the cosmic far-infrared background from each 
flux range for the multiply-broken power-law model versus the log of the 
flux density. The results for the spline interpolation model are almost iden- 
tical. The integral of the curve over the log flux density is proportional to 
the total flux contribution in each band. 



then built per-bolometer maps from our Neptune observations, and 
combined these to find the effective beam at each of these locations. 
The beam varies across the map in a complicated fashion because 



even in our deepest map each pixel only samples a limited sub- 
set of bolometers. This produces significant variation in the P (D) 
with position. In general, the P(D) computed for the bolometer- 
averaged beam does not have to be the same as the P (D) computed 
for each bolometer and then averaged across the array (which is 
the P(D) of the entire map). To evaluate the importance of this 
variation, we compare the P (D) for the average beam to the P (D) 
computed for all 200 pixels and then averaged. The A log £, of this 
comparison is < 0.01, so for our analysis this is negligible. 

Although we masked each map to exclude the low-coverage 
edges, the HerMES SDP observations were not dithered between 
repeats so there are significant variations in the number of measure- 
ments per pixel even within the high-coverage regions (~ 20%). 
This will introduce slight non-Gaussian tails to the instrument noise 
distribution. We simulated this effect in two ways. First, we gen- 
erated random realizations of the instrument noise, including the 
uneven coverage, and compared the P (D) using the resulting noise 
to the P (D) assuming the noise is purely described by the average 
<r, and found negligible A log £, . Second, our end-to-end simula- 
tions (also including 1 // noise) implicitly include uneven coverage 
effects, and we found no bias in the recovered parameters. Future 
HerMES observations will include dithering, which will also have 
the benefit of improving the homogeneity of the beam. 

[Nguyen et al. (2010) explore the noise characteristics of the 
SDP maps by carrying out 'Jack-Knife' tests on the data. Their 
findings are generally consistent with the expected noise properties, 
but it is difficult to rule out some additional level of non-Gaussian 
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Figure 6. Comparison of Euclidean normalized SPIRE P (D) differential number counts (solid lines/circles and 
dashed lines/squares for the multiply-broken power law and spline models, respectively, not using the FIRAS prior 
in both cases) with other SPIRE number counts: first, an analysis of the same data using source extraction techniques 
{Oliver et al.|2010| red stars), and second the H- ATLAS source extraction on an independent field I Clements et al. 
|2010[ triangles). The eiTors are the combined statistical and systematic errors. 



noise beyond the 1// behavior we have simulated. Directly com- 
puting the effects of the Jack- Knife noise histograms on our model 
shows that any additional non-Gaussianity has negligible effects for 
the SDP data, down to our lowest constrained knot (2 mjy). This 
may not be true for future observations where the larger field sizes 
will reduce the statistical error considerably. 

To test the sensitivity to the beam model, we use an alternative 
set of Neptune observations with a much smaller number of repeats 
and coarser sampling. Furthermore, the pointing of these observa- 
tions is not corrected for the small offset between the Herschel and 
SPIRE clocks, and hence they suffer from pointing drift relative to 
the maps of the science field^] The pipeline nominally corrects for 
this offset. However, to be conservative, we allow for the possib- 
lity that the pointing drift might not affect the beam maps in the 
same way as the science maps: we repeat the fits using the alter- 
native beam, and use the difference in the results as an estimate of 
the beam systematic. This is the dominant identified systematic ef- 
fect, with A£ ~ 0.3. We take the differences between the recovered 
parameters between the two beams as the systematic error on each 
knot as given in tables[2]and|4] As our understanding of the SPIRE 
beams improves, it should be possible to decrease this error. 

To further explore issues of pointing drift, we have constructed 
a simple drift model for the GOODS-N field using Jack- Knife com- 
parisons. We then generate simulated maps with and without apply- 
ing this model. Because the effect of the model is largely to twist 
successive observations relative to each other, this has very little 
effect on the P(D), amounting to ~ 0. lcr relative to the statistical 
errors. 

While our results should represent the number counts in our 
fields quite well, sample variance means that they may not perfectly 
represent the number counts we would obtain with an infinitely 
large field. If we make the strong assumptions that the SPIRE clus- 
tering properties measured in |Cooray et al.| (j2010 l apply equally at 
all flux densities (and, in particular, to depths 10 times greater than 
they were measured), that the redshift distribution of our sources is 
independent of measured brightness, and that the source population 
peaks at z = 1.5, then a simple analytic computation suggests that 
sample variance could be a 20 per cent effect on the total number 

5 The SPIRE clock speed differs by a very small amount from the Herschel 
clock, resulting in a cumulative pointing drift with time in SPIRE maps. The 
magnitude of the effect is 0.7 arcseconds per hour, with rephasing occuring 
when "PCAL" internal calibrations are made. 
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Figure 8. Comparison of 250 /im differential number counts derived from 
the three fields for the multiply-broken power-law model. Only statistical 
errors are shown and the FIRAS prior is not included. 



counts in GOODS-N. More empirically, if we split the Lockman- 
SWIRE field into sub-regions there is evidence for variation in the 
pixel histogram from sub-field to sub-field. However, after applying 
our high-pass filter the variations are no longer statistically sign- 
ficant; that is, the difference between sub-fields lies in the mean 
rather than the shape of the histograms. Alternatively, mean sub- 
tracting each sub-field produces the same effect. This suggests that 
the fact that our measurements are not sensitive to the mean map 
values (and therefore are mean-subtracted) may provide some pro- 
tection against sample variance effects. Since estimating the size of 
this effect is highly model dependent, we have not attempted to in- 
clude this in our error budget. We do, however, fit the three fields 
independently at 250 and compare the results, although the dif- 
ferent depths complicate this somewhat. For simplicity, we do not 
marginalize over the instrument noise in this fit, since the only pur- 
pose is to compare the three fields. This is shown in figure[8] Within 
the uncertainties, the fits are consistent. The post-SDP HerMES ob- 
servations will allow sample variance effects to be better quantified. 

We do not include the effects of the SPIRE calibration uncer- 
tainty (~ 15% across all bands) in our error budget, except when 
using the CFIRB prior or computing the fraction of the FIRAS mea- 
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surement accounted for by our model. However, any updates to the 
SPIRE calibration are easily incorporated into our results without 
refitting: if the flux scale is multiplied by a factor a the knot posi- 
tions Kj i-» aKj and the knot values decrease by - log 10 a. 



6 DISCUSSION 

In general, our results agree well with those of P09, except for the 
faintest fluxes fully constrained by their analysis. For example, at 
250 yum they find log lo dN/dS = 5.58!°^ at 20 mjy, while our 
result is 5.139+J}q33 ± 0.025. However, as discussed earlier, they did 
not marginalize over the instrumental noise for their deepest field, 
so their errors may be somewhat underestimated here. There is also 
some evidence from simulated data that the small number of knots 
and knot placement right at the break in the number counts may 
have biased this knot in the P09 analysis. We find good agreement 
with the stacking analysis of the BLAST data, but see some mild 
disagreement at higher fluxes for direct counts of the same data 
( |Bethermin et al.|2010b} as shown in figure[5] 

The deepest number counts available at these wavelengths are 
the result of a semi-traditional source extraction method on the 
same HerMES data set ( |01iver et al.fl 2010). These are compared 
in figure [6] Where there is overlap, the agreement is good. A simi- 
lar analysis was carried out using H-ATLAS SDP data by Clements 
|et al.| (|2010i; this is also shown. Unlike the HerMES source extrac- 
tion and P (D) analysis, the H-ATLAS counts are 250 yum-selected 
at all wavelengths, and hence may not entirely probe the same point 
source population. Nonetheless, again the agreement with the Her- 
MES results is good. 

A few features are worth noting. First, we clearly detect a 
break in the number counts around at 20 mjy in all bands at 
high significance. However, the SPIRE data alone do not detect 
the change in slope in dN/dS necessary to keep the CFIRB fi- 
nite, as the differential counts continue to rise to the lowest limit 
of our analysis more steeply than S~ 2 . When the FIRAS prior is 
added, a break is present, but this mostly affects the lowest flux 
knot, for which we can only provide an upper limit (the effects on 
the other knots are mostly due to the strong correlations between 
knots; the FIRAS prior changes the structure of the correlations 
significantly at low flux densities). Second, there is possible weak 
(~ lcr) evidence for a 'bump' in the differential counts around 400 
mjy at 250 yum. There is no evidence at 350 and 500 around 
this flux density, but the error bars are large. However, this bump is 
present in the independent H-ATLAS field (Clements et al. 2010) 
at 250 yum. The cause is unclear; lensing is an intriguing possibility, 
but we would expect the signature of lensing to be larger at 500 yum 
due to stronger negative A"-correction effects (e.g. |Negrello et al.| 
|2007l >. 

We can compare our model fits to the confusion noise esti- 
mates of N guyen et al.|p010[ > measured using a different technique. 
The often-used criterion of one source per every n beams is difficult 
to use, since its translation into the effects on observations depends 
strongly on the underlying model (Takeuchi & Ishii 2004). Hence, 
we adopt the square root of the variance of the source contribution 
to the pixel distribution (cr con f) as our measure. We find values of 
6.5/6.4/6.1 ± 0.2 mjy in the three bands, slightly higher than the 
Nguyen et al. (2010) values, but by less than 2cr. 

Our fits are compared with a selection of literature models in 
figure [5] No currently available model entirely fits our counts, es- 
pecially when all three bands are considered. There is a general 
discrepancy in the galaxy number count models in that the theoret- 



ical models generically overpredict the number of bright galaxies 
(in the several x 10 to several x 100 mJy range, limited at the up- 
per end by uncertainties in the P (D) number counts) compared to 
the number counts from the P (D) analysis. The best match overall 
across all three SPIRE bands is given by the model of Valiante et 
al. (2009). 

We interpret the discrepancy in the context of the theoreti- 
cal models of Lagac he"et al.|(|2003} and [Fernandez-Cond e et al.| 
( p008> (figure |9j. In Fi gure^Jthe redshifts and FIR luminosities 
of galaxies are plotted versus their observed flux densities for the 
|Fernan dez-Conde et al. ( 2008 ) simulations. The transitions from lu- 
minous to ultra-luminous infra-red galaxies (LIRGs to ULIRGs, at 
10 12 L o ) with increasing observed flux densities occurs at approxi- 
mately 12, 6, and 3 mjy, in the 250, 350, and 500 yum SPIRE bands, 
respectively. Thus, the discrepancy at the bright end likely results 
from the presence of too many ULIRGs in the theoretical models. 
It should be noted that the intrinsic luminosities of the underly- 
ing galaxy population that contribute to any given bin in observed 
flux density depend on the redshift distributions and spectral energy 
distributions; however, the very brightest galaxies are likely either 
intrisically extremely luminous (ULIRGs or brighter), low redshift, 
or strongly lensed. There is a large dispersion in redshifts repre- 
sented by galaxies in each observed flux density bin, with means in 
the range z = 1 - 2, with the average flux densities only mildly in- 
versely correlated with redshifts (due to the negative ^-correction). 

In all three bands at and below 2 mjy, the P (£>)-derived num- 
ber counts are consistent with the theoretical galaxy number count 
models. This is not surprising because: (i) the theoretical galaxy 
number count models are constrained not to overpredict the CFIRB, 
which arises, in large part, from numerous faint galaxies; and (ii) 
the upper limits of the lowest flux density knot in each band lies 
well above the theoretical number counts models. 

Another more subtle feature also seems apparent in the mea- 
sured counts. The results from both fitting methods - lending some 
confidence to their credence - have depressions at the third-lowest 
flux density knots with respect to the theoretical number count 
models at low-to-moderate significance (depending on the theoreti- 
cal model), which are exclusively concave down in this range (a few 
mjy to a few times 10 mjy). The stacking analysis of Be thermin| 
|et al.| ( |2010a| ) is not deep enough at 250 or 350 yum to verify this 
feature, although the turnover in S 2S dN/dS from the peak at ap- 
proximately 10 mjy downward is clear at 250 yum. At 500 yum, the 
stacking analysis does not display the depression. This flux density 
range is approximately at the confusion limit (where the flux den- 
sity is equal to the confusion noise, cr COI1 f = 6 mjy) and multiple 
galaxies contribute to the flux density in each beam. Thus, refer- 
ring to the Fernandez-Conde et al. (2008) models (figure [T0|, this 
flux density range corresponds to the transition from ULIRGS to 
LIRGs, suggesting that LIRG number counts may also be overrep- 
resented by the theoretical galaxy number counts models. 



7 CONCLUSIONS 

We have measured the differential galaxy number counts from 
Herschel-SPIRE Science Demonstration Phase HerMES observa- 
tions at 250, 350, and 500 yum using P (D) techniques and two sim- 
ple parametric models. The number counts were measured down to 
2 mjy, approximately a factor of 3 below the 1 a confusion noise. 
We find that 64 ± 14 per cent of the measured CFIRB is accounted 
for by point sources at 250 yum , falling to 43 ± 12 per cent at 500 
yum. The errors on the fraction of the CFIRB accounted for by these 
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Figure 9. Comparison of our Euclidean normalized differential number counts fits (as in figure|6] and not using the 
FIRAS prior) to a selection of models from the literature. The error bars are the combined statistical and systematic 
errors. 



250 Jim 350 urn 500 urn 




1 10 100 1 10 100 1 10 100 

Flux Density [mJy] Flux Density [mJy] Flux Density [mJy] 



Figure 10. The intrinsic redshifts (upper panels) and infrared luminosities 
(lower panels) of galaxies as a function of the observed flux densities in 
the SPIRE 250, 350, and 500 fim bands (left, middle, and right panels, re- 
spectively) for the Lagache et al. models (Lagache et al. (2003); Fernandez- 
Conde et al. (2008)). The error bars give the interquartile range for each bin. 



sources are now dominated by those of the FIRAS measurement. 
However, because of the remaining fraction not accounted for by 
our fits, this is still not a competitive method for measuring the to- 
tal CFIRB. We find clear evidence of breaks in the slope of the dif- 
ferential number counts at approximately 10-20 mJy in all bands, 
which have been hinted at by previous analyses. 

Where they overlap, our fits agree well with other Herschel re- 
sults. Comparing with a selection of literature models, however, we 
find that no model entirely reproduces our observed number counts. 
As found by |Qliver et al.| ( |2010^ and |Clements et al.|{2Tjl0] >, most 
published models significantly over-predict the number of bright 
sources at these wavelengths and have shallower slopes. We find 
somewhat better agreement at fainter fluxes, at or below the break, 
but the agreement is still not perfect. 

Our main systematic uncertanties arise from our understand- 
ing of the SPIRE beams. We find that a high-pass filter is effective 



in removing the signature of clustering from our counts, but in the 
future it may be preferable to attempt to directly marginalize over 
clustering using simple models. 

These observations represent only ~ 60 hours of the 900 hours 
of observations that HerMES will ultimately obtain (although not 
all of these are with SPIRE). The final dataset will cover a wide 
range of depths and areas. This will significantly increase our abil- 
ity to constrain dN/dS . Having a number of well-seperated deep 
fields will also allow a direct measurement of sample variance. 
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